Norwegian University of Science and Technology
Technical Report IDI-TR-8/2010
Exploiting Time-based Synonyms in Searching Document Archives
Abstract
Recently, a large number of easily accessible information resources have become available. To increase search quality, document creation time can be taken into account in order to increase precision, and query expansion of named entities can be employed in order to increase recall. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change over time. In this paper, we present an approach to extract synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking them and in classifying them into two types: time-independent and time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relation changes over time. Further, we describe how to make use of both types of synonyms in order to increase retrieval effectiveness (precision and recall): query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
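To make the two expansion modes concrete, here is a minimal sketch, not the report's method. It assumes each candidate synonym comes with per-year counts of how often Wikipedia snapshots link it to the entity. It also swaps the report's temporal-pattern ranking features for a simple coverage rule: a synonym observed in at least 80% of the entity's covered years counts as time-independent. The names `Synonym`, `classify`, and `expand_query`, the 0.8 threshold, and all counts in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Synonym:
    """A candidate synonym of a named entity (hypothetical structure)."""
    term: str
    # Assumed input: year -> number of Wikipedia snapshots in that year in which
    # this term was linked to the entity (e.g. via redirects or anchor text).
    yearly_counts: dict[int, int] = field(default_factory=dict)

    def active_years(self) -> set[int]:
        """Years in which the synonym relation was observed at least once."""
        return {year for year, count in self.yearly_counts.items() if count > 0}


def classify(synonym, entity_years, threshold=0.8):
    """Label a synonym time-independent if it is observed in at least
    `threshold` of the entity's covered years; otherwise time-dependent.
    This coverage rule is an assumed stand-in for the report's features."""
    entity_years = set(entity_years)
    if not entity_years:
        raise ValueError("entity_years must not be empty")
    coverage = len(synonym.active_years() & entity_years) / len(entity_years)
    return "time-independent" if coverage >= threshold else "time-dependent"


def expand_query(entity, synonyms, entity_years, interval=None, threshold=0.8):
    """Ordinary search (interval is None): add time-independent synonyms.
    Temporal search (interval = (start, end), inclusive years): add
    time-dependent synonyms that were active somewhere in the interval."""
    terms = [entity]
    for synonym in synonyms:
        kind = classify(synonym, entity_years, threshold)
        if interval is None:
            if kind == "time-independent":
                terms.append(synonym.term)
        else:
            start, end = interval
            in_range = any(start <= y <= end for y in synonym.active_years())
            if kind == "time-dependent" and in_range:
                terms.append(synonym.term)
    return terms


if __name__ == "__main__":
    years = range(2001, 2011)  # hypothetical snapshot coverage, 2001-2010
    synonyms = [  # made-up counts, for illustration only
        Synonym("Ratzinger", {y: 5 for y in range(2001, 2011)}),
        Synonym("Cardinal Ratzinger", {y: 3 for y in range(2001, 2006)}),
        Synonym("Pope Benedict XVI", {y: 9 for y in range(2005, 2011)}),
    ]
    print(expand_query("Joseph Ratzinger", synonyms, years))
    # -> ['Joseph Ratzinger', 'Ratzinger']
    print(expand_query("Joseph Ratzinger", synonyms, years, interval=(2002, 2003)))
    # -> ['Joseph Ratzinger', 'Cardinal Ratzinger']
```

The sketch keeps classification separate from expansion, so a richer classifier could replace `classify` without touching the expansion logic. For example, one trained on temporal features, as the abstract describes. The temporal branch adds only time-dependent synonyms, following the abstract's description. A real system might also keep time-independent synonyms in temporal searches.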
Similar resources
Norwegian University of Science and Technology Technical report IDI-TR-11/2002 Supporting Temporal Text-Containment Queries
In temporal document databases and temporal XML databases, temporal text-containment queries are a potential performance bottleneck. In this paper we describe how to manage documents and index structures in such databases in a way that makes temporal text-containment querying feasible. We describe and discuss different index structures that can improve such queries. Three of the alternatives have...
Norwegian University of Science and Technology Technical Report IDI-TR-09/2007 Semantic-Based Association Rule Mining of Temporal Document Collections
In many contexts today we have documents available in a number of versions. In addition to explicit knowledge that can be queried/searched in documents, these documents also contain implicit knowledge that can be found by text mining. In this paper we will study association rule mining of temporal document collections, and extend our previous work by 1) performing mining based on semantics as w...
Norwegian University of Science and Technology Technical report IDI-TR-X/2002, last revised: 2002-09-02 V2: A Database Approach to Temporal Document Management
The advent of large amounts of data on the web has closed the gap between the document storage and database communities. In this paper, this work is continued by the description of the foundations for temporal document databases. We describe the V2 temporal document database, which supports storage, retrieval, and querying of temporal documents. We describe functionality and operations/operator...
Norwegian University of Science and Technology Technical report IDI-TR-10/2002 Design, Implementation, and Performance of the V2 Temporal Document Database System
The advent of large amounts of data on the web has closed the gap between the document storage and the database communities. In this paper, this work is continued by the description of the foundations for temporal document databases. We describe functionality and operations/operators to be supported by such systems, and more specifically we describe the architecture for management of temporal d...
Norwegian University of Science and Technology Technical Report IDI-TR-1/2003 Algorithms for Granularity Reduction in Temporal Document Databases
With rapidly decreasing storage costs, temporal document databases are now a viable solution in many contexts. However, storing an ever-growing database can still be too costly, and as a consequence it is desirable to be able to physically delete old versions. Traditionally, this has been performed by an operation called vacuuming, where the oldest versions are physically deleted (or migrated fro...